Optimization of Regular Path Queries in Graph Databases
Regular path queries offer a powerful navigational mechanism in graph databases. Recently, there has been renewed interest in such queries in the context of the Semantic Web. SPARQL 1.1 extends the language with property paths, a form of regular path query for RDF graph databases. While eminently useful, such queries are difficult to optimize and evaluate efficiently. We design and implement a cost-based optimizer, which we call Waveguide, for SPARQL queries with property paths. Waveguide builds a query plan, which we call a waveplan (WP), that guides the query evaluation. There are numerous choices in the construction of a plan and a number of optimization methods, so the space of plans for a query can be quite large. Execution costs of plans for the same query can vary by orders of magnitude, with the best plan often offering excellent performance. A WP's costs can be estimated, which opens the way to cost-based optimization. We demonstrate that Waveguide properly subsumes existing techniques and that the new plans it adds are relevant. We analyze the effective plan space enabled by Waveguide and design an efficient enumerator for it. We implement a prototype of a Waveguide cost-based optimizer on top of an open-source relational RDF store. Finally, we perform a comprehensive performance study of the state of the art for evaluating SPARQL property paths and demonstrate the significant performance gains that Waveguide offers.
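For readers unfamiliar with property paths, the sketch below shows one textbook way to evaluate a regular path query from a fixed source node: a breadth-first search over pairs of (graph node, automaton state). It is a generic baseline, not Waveguide's waveplan evaluation. The triple list stands in for an RDF store, the automaton for the path expression `knows+` is written by hand, and the function name `evaluate_rpq` and the example data are hypothetical.

```python
from collections import deque

Graph = list[tuple[str, str, str]]      # (subject, predicate, object) triples
NFA = dict[int, list[tuple[str, int]]]  # state -> [(edge label, next state)]


def evaluate_rpq(graph: Graph, nfa: NFA, start: int,
                 accepting: set[int], source: str) -> set[str]:
    """Return every node reachable from `source` along a path whose label
    sequence the NFA accepts, via BFS over (graph node, NFA state) pairs."""
    # Index outgoing edges by subject so each expansion is a dictionary lookup.
    out_edges: dict[str, list[tuple[str, str]]] = {}
    for s, label, o in graph:
        out_edges.setdefault(s, []).append((label, o))

    seen = {(source, start)}
    queue = deque(seen)
    answers: set[str] = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        # Advance the graph and the automaton together on matching labels.
        for label, nxt_node in out_edges.get(node, []):
            for nfa_label, nxt_state in nfa.get(state, []):
                if label == nfa_label and (nxt_node, nxt_state) not in seen:
                    seen.add((nxt_node, nxt_state))
                    queue.append((nxt_node, nxt_state))
    return answers


# Hand-built NFA for knows+: 0 --knows--> 1, 1 --knows--> 1; state 1 accepts.
knows_plus = {0: [("knows", 1)], 1: [("knows", 1)]}
triples = [("alice", "knows", "bob"), ("bob", "knows", "carol"),
           ("carol", "worksAt", "acme")]
print(sorted(evaluate_rpq(triples, knows_plus, 0, {1}, "alice")))  # ['bob', 'carol']
```

Tracking visited (node, state) pairs rather than nodes alone keeps the search finite on cyclic graphs while still allowing a node to be revisited in a different automaton state.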
GGDs: Graph Generating Dependencies
We propose Graph Generating Dependencies (GGDs), a new class of dependencies for property graphs. Extending the expressivity of state-of-the-art constraint languages, GGDs can express both tuple- and equality-generating dependencies on property graphs, both of which find broad application in graph data management. We provide the formal definition of GGDs, analyze the validation problem for GGDs, and demonstrate their practical utility. Comment: 5 pages.
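To make the two kinds of dependency concrete, here is a deliberately simplified validation check. In the paper, GGDs relate general graph patterns with constraints; this sketch is a toy that assumes both sides are single-edge patterns, and the names `PropertyGraph`, `violations_tgd`, and `violations_egd` are hypothetical. Validation here means listing the matches of the source pattern for which the required edge (tuple-generating) or the required property equality (equality-generating) is missing.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    # Node id -> property map, e.g. {"a": {"city": "Lyon"}}.
    nodes: dict[str, dict[str, object]] = field(default_factory=dict)
    # Labelled directed edges as (source, label, target) triples.
    edges: set[tuple[str, str, str]] = field(default_factory=set)


def violations_tgd(g: PropertyGraph, src_label: str,
                   tgt_label: str) -> list[tuple[str, str]]:
    """Hypothetical tuple-generating rule on single edges:
    every x -src_label-> y requires an edge y -tgt_label-> x."""
    return [(x, y) for (x, lab, y) in sorted(g.edges)
            if lab == src_label and (y, tgt_label, x) not in g.edges]


def violations_egd(g: PropertyGraph, label: str,
                   prop: str) -> list[tuple[str, str]]:
    """Hypothetical equality-generating rule on single edges:
    for every x -label-> y, x and y must agree on property `prop`."""
    return [(x, y) for (x, lab, y) in sorted(g.edges)
            if lab == label
            and g.nodes.get(x, {}).get(prop) != g.nodes.get(y, {}).get(prop)]
```

For example, a graph whose only edge is `("a", "marriedTo", "b")` yields `[("a", "b")]` from `violations_tgd(g, "marriedTo", "marriedTo")`, because the symmetric edge is absent.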
The Future is Big Graphs! A Community View on Graph Processing Systems
Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue to succeed? Comment: 12 pages, 3 figures; collaboration between the large-scale systems and data management communities; work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems; to be published in the Communications of the ACM.
A General Cardinality Estimation Framework for Subgraph Matching in Property Graphs
We introduce a framework for cardinality estimation of query patterns over property graph databases. This framework makes it possible to analyze, compare, and combine different cardinality estimation approaches. It consists of three phases: obtaining a set of estimates for some subqueries, extending this set, and finally combining the set into a single cardinality estimate for the query. We show that (parts of) many existing cardinality estimation approaches can be used as techniques in one of the phases of our framework. The phases are loosely coupled, making it possible to combine (parts of) current cardinality estimation approaches. We created a graph version of the Join Order Benchmark to experiment with different combinations of techniques. The results showed that query patterns without property constraints can be accurately estimated using synopses for small patterns. Accurate estimation of query patterns with property constraints requires new estimation techniques that capture correlations between the property constraints and the topology of graph databases.
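The three phases can be illustrated on the simplest pattern shape, a chain of edge labels. In the sketch below, which is a hypothetical instance rather than any specific technique from the paper, phase 1 uses exact counts for sub-patterns of at most two edges as the "synopses"; phase 2 extends them to longer sub-patterns under an assumed uniform-join model, estimating a split into a left and a right part as |left| * |right| / n, where n is the number of nodes that touch an edge; phase 3 combines the candidate estimates for the full query, one per split point, with a pluggable `combine` function (the minimum by default). The names `count_walks` and `estimate_path` are illustrative.

```python
Edge = tuple[str, str, str]  # (source, label, target)


def count_walks(edges: list[Edge], labels: tuple[str, ...]) -> int:
    """Exact number of walks whose edge labels spell `labels` (bag semantics)."""
    walks: dict[str, int] = {}  # end node -> number of matching walks ending there
    for s, lab, t in edges:
        if lab == labels[0]:
            walks[t] = walks.get(t, 0) + 1
    for label in labels[1:]:
        nxt: dict[str, int] = {}
        for s, lab, t in edges:
            if lab == label and s in walks:
                nxt[t] = nxt.get(t, 0) + walks[s]
        walks = nxt
    return sum(walks.values())


def estimate_path(edges: list[Edge], query: tuple[str, ...], combine=min) -> float:
    n = max(1, len({s for s, _, _ in edges} | {t for _, _, t in edges}))
    k = len(query)
    est: dict[tuple[str, ...], float] = {}

    # Phase 1: exact counts for every sub-path of length 1 or 2 (the synopses).
    for i in range(k):
        for j in range(i + 1, min(i + 2, k) + 1):
            est[query[i:j]] = count_walks(edges, query[i:j])

    def candidates(sub: tuple[str, ...]) -> list[float]:
        # One estimate per split point, assuming join nodes are uniformly spread.
        return [est[sub[:m]] * est[sub[m:]] / n for m in range(1, len(sub))]

    # Phase 2: extend the set bottom-up to the remaining proper sub-paths.
    for length in range(3, k):
        for i in range(k - length + 1):
            est[query[i:i + length]] = combine(candidates(query[i:i + length]))

    # Phase 3: combine the candidates for the whole query into one number.
    if query in est:
        return float(est[query])
    return float(combine(candidates(query)))
```

On a four-node graph with edges 1 -a-> 2, 2 -b-> 3, 2 -b-> 4 and 3 -c-> 4, the path a/b/c has exactly one match, while the two split candidates are 1 * 1 / 4 = 0.25 and 2 * 1 / 4 = 0.5. Which one phase 3 keeps depends only on `combine`, so swapping it out changes the final estimate without touching phases 1 and 2.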
Scalable temporal clique enumeration
We study the problem of enumerating all k-sized subsets of temporal events that mutually overlap at some point in a query time window. This problem arises in many application domains, e.g., social networks, life sciences, smart cities, and telecommunications. We propose a start time index (STI) approach that overcomes the efficiency bottlenecks of current methods, which rely on 2-way join algorithms to enumerate temporal k-cliques. Additionally, we investigate how precomputed checkpoints can further improve the efficiency of STI. Our experimental results demonstrate that STI outperforms the state of the art by a wide margin and that our checkpointing strategies are effective.
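A useful fact behind this problem is that intervals on a line that pairwise overlap always share a common point, namely the latest start time among them, so a k-subset qualifies exactly when max(latest start, window start) <= min(earliest end, window end). The sketch below exploits this with a plain sweep over events sorted by start time: each clique is emitted once, when its last-starting member is reached. It is not the paper's STI index or its checkpointing; the function name `temporal_k_cliques` and the `(id, start, end)` event format with closed intervals are assumptions.

```python
from collections.abc import Iterator
from itertools import combinations

Event = tuple[str, int, int]  # (id, start, end), closed interval


def temporal_k_cliques(events: list[Event], k: int,
                       ws: int, we: int) -> Iterator[tuple[str, ...]]:
    """Yield every k-subset of events (as sorted id tuples) whose intervals
    share a common time point inside the window [ws, we], assuming ws <= we."""
    active: list[Event] = []  # earlier events that may still overlap later ones
    for eid, start, end in sorted(events, key=lambda e: (e[1], e[0])):
        # If this event starts last, the earliest possible common point is p.
        p = max(start, ws)
        if p > we:
            break  # all remaining events start after the window closes
        # p never decreases during the sweep, so events ending before p are dead.
        active = [e for e in active if e[2] >= p]
        if end >= p:
            # Pair the current event with every (k-1)-subset of still-open events.
            for others in combinations(active, k - 1):
                yield tuple(sorted([eid] + [o[0] for o in others]))
        active.append((eid, start, end))


events = [("a", 1, 5), ("b", 2, 6), ("c", 4, 8), ("d", 7, 9)]
print(sorted(temporal_k_cliques(events, 3, 0, 10)))  # [('a', 'b', 'c')]
```

Rebuilding the `active` list on every step keeps the code short but costs time linear in the number of open events per step; a serious implementation would replace it with a proper index structure.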
Querying Graphs
Graph data modeling and querying arise in many practical application domains, such as social and biological networks, where the primary focus is on concepts, their relationships, and the rich patterns in these complex webs of interconnectivity. In this book, we present a concise, unified view of the basic challenges that arise over the complete life cycle of formulating and processing queries on graph databases. To that end, we present all major concepts relevant to this life cycle, formulated in terms of a common, unifying ground: the property graph data model, the predominant data model adopted by modern graph database systems. We aim especially to give a coherent and in-depth perspective on current graph querying and an outlook on future developments. Our presentation is self-contained, covering the relevant topics: graph data models, graph query languages and graph query specification, graph constraints, and graph query processing. We conclude by indicating major open research challenges towards the next generation of graph data management systems.